Open R studio, open, and run vector-analysis-lab-setup.R to setup the lab, load the spatial datasets and run re-projections.
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(message = FALSE)
knitr::opts_chunk$set(warnings = FALSE)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.3 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 2.0.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readxl)
library(sf)
## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
library(tmap)
setwd("c:/temp/class2421")
source("./functions/functions1c.R")
# read spatial datasets and re-project to wgs84 if needed
states = st_read("./gisdata/cb_2018_us_state_20m.shp") %>%
tolow() %>%
st_transform(4326) %>%
filter(!(stusps %in% c("AK", "HI", "PR")))
## Reading layer `cb_2018_us_state_20m' from data source
## `C:\temp\class2421\gisdata\cb_2018_us_state_20m.shp' using driver `ESRI Shapefile'
## Simple feature collection with 52 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS: NAD83
counties = st_read("./gisdata/cb_2018_us_county_20m.shp") %>%
tolow() %>%
st_transform(4326)
## Reading layer `cb_2018_us_county_20m' from data source
## `C:\temp\class2421\gisdata\cb_2018_us_county_20m.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3220 features and 9 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS: NAD83
nercs = st_read("./gisdata/NercRegions_201907.shp") %>%
tolow()
## Reading layer `NercRegions_201907' from data source
## `C:\temp\class2421\gisdata\NercRegions_201907.shp' using driver `ESRI Shapefile'
## Simple feature collection with 8 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -124.7558 ymin: 24.51833 xmax: -66.954 ymax: 49.39724
## Geodetic CRS: WGS 84
tlines = st_read("./gisdata/HVTranLines.shp") %>%
tolow()
## Reading layer `HVTranLines' from data source
## `C:\temp\class2421\gisdata\HVTranLines.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3183 features and 17 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: -123.4648 ymin: 25.81335 xmax: -66.42408 ymax: 49.85894
## Geodetic CRS: WGS 84
plants = st_read("./gisdata/PowerPlants_US_202004.shp") %>%
tolow() %>%
filter(total_mw >= 100) %>%
filter(!(statename %in% c("Alaska", "Hawaii", "Puerto Rico")))
## Reading layer `PowerPlants_US_202004' from data source
## `C:\temp\class2421\gisdata\PowerPlants_US_202004.shp' using driver `ESRI Shapefile'
## Simple feature collection with 9943 features and 31 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -171.7124 ymin: 17.94712 xmax: -65.27956 ymax: 71.292
## Geodetic CRS: WGS 84
# define sf projection with OGC WKT human-readable text strings
# note the OGC WKT keywords shown in capitals
usalberswkt =
'PROJCS["USA_Contiguous_Albers_Equal_Area_Conic_USGS_version",
GEOGCS["GCS_North_American_1983",
DATUM["D_North_American_1983",
SPHEROID["GRS_1980",6378137.0,298.257222101]],
PRIMEM["Greenwich",0.0],
UNIT["Degree",0.0174532925199433]],
PROJECTION["Albers"],
PARAMETER["False_Easting",0.0],
PARAMETER["False_Northing",0.0],
PARAMETER["Central_Meridian",-96.0],
PARAMETER["Standard_Parallel_1",29.5],
PARAMETER["Standard_Parallel_2",45.5],
PARAMETER["Latitude_Of_Origin",23.0],
UNIT["Meter",1.0]]'
statesaea = states %>% st_transform(usalberswkt)
# defining a projection with a proj4 string
# note the + as the lead character for each parameter
utm16 = "+proj=utm +zone=16 +ellps=GRS80 +datum=NAD83 +units=m +no_defs"
statesutm16 = states %>% st_transform(utm16)
counties2 = counties %>% st_transform(usalberswkt)
nercs2 = nercs %>% st_transform(usalberswkt)
states2 = states %>% st_transform(usalberswkt)
tlines2 = tlines %>% st_transform(usalberswkt)
plants2 = plants %>% st_transform(usalberswkt)
An ordinary join matches records from two dataframes using columns of shared values. A spatial join matches features from two spatial dataframes using a geographic relationship between features; and it copies attributes from the joined spatial dataframe to the source spatial dataframe.
Assume we want to simply assign each US county to a NERC region. We could conduct a long, complex polygon overlay, but an alternate quick method is to convert county polygons to centroids, then conduct a spatial join to join NERC attributes to each centroid.
coucents2 = st_centroid(counties2)
## Warning in st_centroid.sf(counties2): st_centroid assumes attributes are
## constant over geometries of x
counerc2 = coucents2 %>% st_join(nercs2, join = st_intersects)
counerc3 = st_drop_geometry(counerc2) %>%
select(geoid,nerc,nerc_label)
counerc4 = counties %>% left_join(counerc3)
We’ll now plot the results, and plot a zoomed-in version for Texas. Use CTRL-C and CTRL-V to paste this text into your R script, and run each piece of it.
Run each command below
zoomx = function(sfin){
box = st_bbox(sfin)
xlimits = c(box$xmin,box$xmax)
return(xlimits) }
zoomy = function(sfin){
box = st_bbox(sfin)
ylimits = c(box$ymin,box$ymax)
return(ylimits) }
ggplot() +
geom_sf(data=states2) +
geom_sf(data=counerc4, aes(fill=nerc))
targstates = states2 %>% filter(name %in% c("Texas"))
ggplot() +
geom_sf(data=states2) +
geom_sf(data=counerc4, aes(fill=nerc)) +
geom_sf(data=states2,fill=NA,size=2) +
geom_sf(data=nercs2,fill=NA,color="purple") +
coord_sf(xlim=zoomx(targstates),ylim=zoomy(targstates))
Zooming to a map area is somewhat awkward in ggplot. You use an extra + and command of the form:
coord_sf(xlim = c(-20, 45), ylim = c(30, 73)
However, for any sf dataset you can extract the bounding box with a command of the form
box = st_bbox(states2)
which produces output of the form
xmin ymin xmax ymax
-2356113.7 268660.9 2258154.4 3165721.7
The spatial join operation has a number of different geographic relationships that can be used to join features. If you use st_join without any special arguments, the actual command issued includes the default geographic relationship, which is st_intersects. The command shown below is the one that is really executed when you type the counerc1 command in the sequence above.
counerc2 = coucents2 %>% st_join(nercs2, join = st_intersects)
You can substitute other geographic relationships, such as:
st_contains_properly
st_contains
st_covered_by
st_covers
st_crosses
st_disjoint
st_equals_exact
st_equals
st_is_within_distance
st_nearest_feature
st_overlaps st_touches
st_within
As discussed below in Task 6, st_intersects returns a set of TRUE/FALSE values as st_join tries to match each record in the first(input) dataframe to each record in the join (second) dataframe, using the specified (or default st_intersects) geographic relationship.
If the match attempt returns TRUE, an output record is created combining the attributes of the first dataframe and the attributes of the second dataframe.
Our next task is to identify areas within 10 miles (16.09 km) of a Georgia high voltage transmission line.
We will first clip the national tlines dataset to the Georgia boundary, project the tlines data into a projected coordinate system, buffer the lines, and re-project back to WGS84.
georgia2 = states2 %>% filter(name == "Georgia")
gatlines2 = tlines2 %>% st_intersection(georgia2)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
ggplot() +
geom_sf(data=georgia2) +
geom_sf(data=gatlines2,color="orange")
Check the ggplot to confirm that the intersection/clip has worked correctly.
We now create the buffer:
gatbuffer2 = gatlines2 %>% st_buffer(dist=16093.4)
ggplot() +
geom_sf(data=georgia2) +
geom_sf(data=gatbuffer2,color="green")
gatbuffer3 = st_union(gatbuffer2)
ggplot() +
geom_sf(data=georgia2) +
geom_sf(data=gatbuffer3,color="green")
gatbuffer4 = st_sf(gatbuffer3) %>%
mutate(buffdist = "16093.4 meters")
ggplot() +
geom_sf(data=georgia2) +
geom_sf(data=gatbuffer4,color="green")
We would now like to test whether most of Georgia’s large power plants fall inside the 10-mile buffer around major transmission lines.
We first pull Georgia’s plants from the national plant dataset.
The crucial operation is st_intersets(), which returns a matrix of TRUE or FALSE values depending on whether each object intersects another set of objects. The sparse=FALSE clause ensures that both TRUE and FALSE values are returned, rather than only TRUE values.
gaplants2 = plants2 %>% st_intersection(georgia2)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
neartlines2 = gaplants2 %>% st_intersects(gatbuffer4, sparse=FALSE)
neartlines3 = data.frame(neartlines2)
gaplants3 = gaplants2 %>% bind_cols(neartlines3)
ggplot() +
geom_sf(data=georgia2) +
geom_sf(data=gatbuffer4,color="green") +
geom_sf(data=gatlines2,color="orange") +
geom_sf(data=gaplants3,aes(color=neartlines2))
Our next spatial task: merging polygons that share an attribute value. We wish to generate a polygon dataset with one polygon for each of the nine Census divisions, each of which consists of a number of states. Georgia is in the South Atlantic division, which stretches from Georgia north to Maryland.
We are able to conduct the merge without using a special st command. Instead we can use the dplyr group_by and summarize commands. In one step we are able to both merge polygons and aggregate individual state numbers into division numbers.
We first read a dataset of state information. It includes a record for each state and several fields, include the states division code and division name.
stateinfo = read_excel("./data/stateinfo5.xls") %>%
rename(name = statename)
states3 = states2 %>% left_join(stateinfo)
divisions2 = states3 %>% group_by(divname) %>%
summarize(aland = sum(aland),
awater = sum(awater))
ggplot() +
geom_sf(data=divisions2,aes(fill=divname))
ggplot() +
geom_sf(data=divisions2,aes(fill=divname)) +
geom_sf(data=states2,alpha=0)
Our final spatial analysis task is classic polygon overlay. We will be overlaying state polygons and NERC polygons to generate small polygons that are “nested” within both states and NERCs. This will allow us, for example, to calculate the area of each state within each NERC.
We first simplify the attributes for each input layer, then invoke st_intersection() to generate the small polygon dataset.
Finally we plot the results with ggplot and with a tmap interactive map.
states4 = states2 %>%
select(stname=name, geoid)
nercs4 = nercs2 %>%
select(nercname = nerc)
overlay4 = nercs4 %>%
st_intersection(states4)
## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries
ggplot(data=overlay4) +
geom_sf(aes(fill=nercname))
tmap_mode("view")
tm_shape(overlay4) +
tm_polygons(col="nercname",
popup.vars = c("nercname","stname"))